already correctly reflect the different activities of the metabolic pathways (which is only
true on statistical average or for sufficiently large networks).
Finally, even the semiquantitative models for signal modeling use heuristics; in particular,
the kinetics are estimated solely from the Boolean network of the process to be modeled.
This allows me to get started with such a model even when little is known in detail about
the speed and nature of the proteins, enzymes, kinases, etc. involved.
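To make the idea concrete, here is a minimal Python sketch of how a Boolean network could be turned into a rough kinetic model. It is only an illustration under stated assumptions, not the method of any particular tool: the three-node cascade, the node names, the fuzzy-logic reading of the rules (AND = min, NOT x = 1 - x) and the single common rate constant are all made up for this example.

```python
# Hypothetical signalling cascade; node names and rules are invented for illustration.
# Each rule maps the current activities (values between 0 and 1) to a target activity.
# Boolean logic is relaxed to fuzzy logic: AND = min, NOT x = 1 - x.
RULES = {
    "receptor": lambda s: s["ligand"],
    "kinase": lambda s: min(s["receptor"], 1.0 - s["phosphatase"]),
    "response": lambda s: s["kinase"],
}


def simulate(state, steps=200, dt=0.1, rate=1.0):
    """Euler integration of dx/dt = rate * (rule(x) - x) for every regulated node.

    Nodes without a rule (here: ligand, phosphatase) are inputs and stay constant.
    The common rate constant is an assumption; no measured kinetics are used.
    """
    state = dict(state)
    for _ in range(steps):
        # Evaluate all rules on the old state first, then update all nodes together.
        targets = {node: rule(state) for node, rule in RULES.items()}
        for node, target in targets.items():
            state[node] += dt * rate * (target - state[node])
    return state


start = {"ligand": 1.0, "phosphatase": 0.0,
         "receptor": 0.0, "kinase": 0.0, "response": 0.0}
with_signal = simulate(start)
blocked = simulate({**start, "phosphatase": 1.0})

print(round(with_signal["response"], 3), round(blocked["response"], 3))
```

At the extreme values 0 and 1 the model behaves like the underlying Boolean network (the response switches on with the ligand and stays off when the phosphatase is active), while in between it yields smooth time courses without any measured rate constants.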
How can you now program a heuristic search yourself?
The BioPerl and BioJava modules (https://bioperl.org/, https://biojava.org/) at the EBI
(European Bioinformatics Institute) are a good way to quickly program a heuristic search,
a simple program, or a larger program composed of simple parts. They provide
ready-written modules (program parts) for reading input and writing output, but also for
web servers or database searches. The Perl Cookbook (Christiansen and Torkington 2003)
offers many tips for concrete implementation in the Perl programming language.
Further tips can be found in other publications (Angly et al., 2014; Vos et al., 2011;
Stajich et al., 2002; Tisdal et al., 2001).
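Independently of these libraries, the principle of a heuristic search fits into a few lines. The following Python sketch is only an illustration under stated assumptions: it does not use BioPerl or BioJava, and the function names, the word length k = 3, the threshold min_shared and the three toy sequences are all invented for the example. Instead of aligning the query against every entry, it looks up short words in an index and ranks the entries by the number of shared words.

```python
from collections import defaultdict


def build_index(database, k=3):
    """Map every k-letter word to the set of entry names that contain it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def heuristic_search(query, index, k=3, min_shared=2):
    """Rank entries by the number of distinct query words they share.

    This is fast but not exact: entries that share fewer than
    min_shared words with the query are never reported.
    """
    words = {query[i:i + k] for i in range(len(query) - k + 1)}
    counts = defaultdict(int)
    for word in words:
        for name in index.get(word, ()):
            counts[name] += 1
    hits = [(name, n) for name, n in counts.items() if n >= min_shared]
    # Best hit first; ties are broken alphabetically by entry name.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Toy database with made-up sequences.
DATABASE = {
    "entry_a": "MKVLAAGIVGLL",
    "entry_b": "MKVLSAGIVALL",
    "entry_c": "PPQQRRSSTTWW",
}

index = build_index(DATABASE)
hits = heuristic_search("KVLAAGIV", index)
print(hits)
```

Only the entries returned here would then be passed on to a slower, exact comparison. The price of the speed is that a distantly related entry without enough identical words is missed.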
For calculations, “Numerical Recipes” (https://numerical.recipes) is a real treasure
trove. Originally a book (Press et al., 2007), it now explains online in a clear way how I
can quickly and easily carry out small calculations, or even surprisingly complex ones, that
come up again and again in many problems. As in a cooking recipe, the principles
are explained and code is provided, for example to make Matlab code run faster
(tutorial: https://numerical.recipes/nr3_matlab.html) or to use C++ code for even faster
calculations instead. Examples of applications of these numerical recipes, also in
bioinformatics, include efficient matrix and vector calculations (to calculate metabolic fluxes
efficiently), routines for geometric tasks (to calculate protein structures), and the generation
of random numbers (for population simulations in ecology).
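As a small illustration of such a matrix and vector calculation, the following Python sketch checks whether a set of metabolic fluxes is balanced. It is not taken from Numerical Recipes. The toy pathway (uptake of A, conversion of A to B, export of B), the matrix S and the flux values are assumptions made for this example, based on the common steady-state formulation in which the stoichiometric matrix times the flux vector must be zero.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Hypothetical pathway:  -> A (v1),  A -> B (v2),  B -> (v3)
# Rows are metabolites (A, B), columns are reactions (v1, v2, v3).
S = [
    [1, -1, 0],   # A is produced by v1 and consumed by v2
    [0, 1, -1],   # B is produced by v2 and consumed by v3
]

# Net production of each metabolite for two candidate flux vectors.
balanced = mat_vec(S, [2.0, 2.0, 2.0])
unbalanced = mat_vec(S, [2.0, 1.0, 1.0])

print(balanced)    # every entry zero: no metabolite accumulates
print(unbalanced)  # first entry positive: A accumulates
```

For networks with thousands of reactions, a plain loop like this becomes slow, which is exactly where optimised matrix routines pay off.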
Conclusion
In this chapter we have tried to look a little behind the façade of the fast bioinformatics
programs on the net, such as the BLAST server at the NCBI (National Center for
Biotechnology Information) in Bethesda, near Washington. In most cases, you get an
answer within seconds to a few minutes. This is made possible by fast but not entirely
accurate searches (heuristics), and we have seen some tricks for doing this. For example, in
BLAST, the heuristic is to first find two short but perfectly matching alignments in the same
database entry before I check over the whole sequence length how similar the entry is
to the query sequence.
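The two-hit idea can be sketched in Python as follows. This is a simplified illustration, not the actual BLAST implementation: it uses only exact word matches, the word length w = 3 and the window of 40 positions are assumed values, and the function names and toy sequences are invented. Two hits count only if they do not overlap and lie on the same diagonal, that is, at the same offset between query and database entry.

```python
from collections import defaultdict


def word_hits(query, subject, w=3):
    """Yield (query_pos, subject_pos) for every exact w-letter word match."""
    positions = defaultdict(list)
    for j in range(len(subject) - w + 1):
        positions[subject[j:j + w]].append(j)
    for i in range(len(query) - w + 1):
        for j in positions.get(query[i:i + w], ()):
            yield i, j


def has_two_hit(query, subject, w=3, window=40):
    """True if two non-overlapping word hits share a diagonal within the window."""
    by_diagonal = defaultdict(list)
    for i, j in word_hits(query, subject, w):
        by_diagonal[j - i].append(i)
    for starts in by_diagonal.values():
        for first in starts:
            for second in starts:
                # At least w apart (no overlap), at most `window` apart.
                if w <= second - first <= window:
                    return True
    return False


def candidates(query, database, w=3, window=40):
    """Names of the entries that deserve the slower full-length comparison."""
    return [name for name, seq in database.items()
            if has_two_hit(query, seq, w, window)]


# Toy data with made-up sequences.
QUERY = "ACGTACGTTTGACCA"
DATABASE = {
    "seq1": "GGACGTACGTTTGACCAGG",   # contains the query
    "seq2": "ACGAAAAAAAAAAAAA",      # shares a single word only
    "seq3": "TTTTTTTTTTTT",          # many hits, but all on different diagonals
}

selected = candidates(QUERY, DATABASE)
print(selected)
```

Only the selected entries are then compared over their whole length. Entries with a single chance word match, such as seq2 and seq3 here, are discarded cheaply.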
It is equally important to make the database (e.g. GenBank, UniProt) quickly readable,
for example by indexing it (after all, you find something in this book much more quickly via
the table of contents than by leafing through it). In addition to speed, sensitivity (do I
recognise all relevant entries?) and specificity (do I avoid getting too many non-relevant
entries?) are also important for a good search.
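Both measures can be computed directly once it is known which entries are truly relevant. The following Python sketch uses the standard definitions (sensitivity = true positives / all relevant entries, specificity = true negatives / all non-relevant entries). The function name and the ten toy entries are assumptions made for this illustration.

```python
def sensitivity_specificity(reported, relevant, all_entries):
    """Return (sensitivity, specificity) of a search result.

    reported:    entries returned by the search
    relevant:    entries that should have been found
    all_entries: every entry in the database
    """
    reported, relevant, all_entries = set(reported), set(relevant), set(all_entries)
    tp = len(reported & relevant)                 # found and relevant
    fn = len(relevant - reported)                 # relevant but missed
    fp = len(reported - relevant)                 # found but not relevant
    tn = len(all_entries - reported - relevant)   # correctly left out
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Toy database of ten entries, four of them truly relevant.
ALL = [f"e{i}" for i in range(1, 11)]
RELEVANT = ["e1", "e2", "e3", "e4"]
REPORTED = ["e1", "e2", "e3", "e7"]   # one relevant entry missed, one false hit

sens, spec = sensitivity_specificity(REPORTED, RELEVANT, ALL)
print(sens, spec)
```

A stricter heuristic typically raises specificity at the cost of sensitivity, and the reverse holds for a more permissive one, so both values should be reported together.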
6 Extremely Fast Sequence Comparisons Identify All the Molecules That Are Present…